LETTER CHANGE BIAS AND LOCAL UNIQUENESS IN OPTIMAL SEQUENCE ALIGNMENTS
Abstract. Considering two optimally aligned random sequences, we investigate the effect on the alignment score caused by changing a random letter in one of the two sequences. Using this idea in conjunction with large deviations theory, we show that in alignments with a low proportion of gaps the optimal alignment is locally unique in most places with high probability. This has implications in the design of recently pioneered alignment methods that use the local uniqueness as a homology indicator.
1. Introduction. The purpose of this paper is to gain insight into the local multiplicity of optimal alignments of two random binary sequences. Before introducing the problem setting, let us give a brief motivation and literature review.
1.1. Motivation. A fairly general and useful technique to identify high-quality alignments of two sequences $x = x_1 \dots x_m$ and $y = y_1 \dots y_n$ with characters from a finite alphabet $\mathcal{A}$ is to consider alignments with gaps $\sqcup$,

$$\begin{array}{ccccc} \sqcup & x_1 & x_2 & x_3 & \sqcup \\ y_1 & y_2 & \sqcup & y_3 & y_4 \end{array}$$

and to quantify the quality of such an alignment with a score of the form

$$S(x,y) = s(\sqcup,y_1) + s(x_1,y_2) + s(x_2,\sqcup) + s(x_3,y_3) + s(\sqcup,y_4). \qquad (1.1)$$

When a scoring function of the form (1.1) is used, a good choice of individual scores $s(a,b)$ of matched symbols $a$ and $b$ and the choice of the alphabet $\mathcal{A}$ depend on the specific application one has in mind. The matching of any symbol $a \in \mathcal{A}$ with a gap $\sqcup$ is typically penalized by negative score terms $s(a,\sqcup), s(\sqcup,a) < 0$.

The simplest similarity measure of the form (1.1) arises as the length of a longest common subsequence (LCS). A common subsequence is any sequence that can be obtained by deleting some characters of either sequence and keeping the remaining ones in the original order. The length of a longest common subsequence of two sequences $x$ and $y$ equals the maximum of the scores $S(x,y)$ of the form (1.1) over all alignments of $x$ and $y$, where the individual scoring terms are defined as follows,

$$s(a,b) = \begin{cases} 1 & \text{if } a = b \neq \sqcup, \\ 0 & \text{if } a \neq b \text{ and } a, b \neq \sqcup, \\ 0 & \text{if } a = \sqcup \text{ or } b = \sqcup, \text{ but not both.} \end{cases}$$
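The LCS score can be computed with the classical quadratic dynamic program over prefixes. The Python sketch below is a standard textbook construction, not something specific to this paper. It assumes the LCS scoring just defined (match 1, mismatch 0, gap 0), and the function name `lcs_length` is ours.

```python
def lcs_length(x: str, y: str) -> int:
    """Maximal score S(x, y) under the LCS scoring: s(a, a) = 1 for letters,
    s(a, b) = 0 for distinct letters, and 0 for a letter against a gap."""
    m, n = len(x), len(y)
    # best[i][j] = maximal score of an alignment of the prefixes x[:i] and y[:j]
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = best[i - 1][j - 1] + (1 if x[i - 1] == y[j - 1] else 0)
            # x_i against a gap, or y_j against a gap, adds 0 to the score
            best[i][j] = max(pair, best[i - 1][j], best[i][j - 1])
    return best[m][n]


print(lcs_length("I-do-not-like-symmetry", "I-detest-symmetry"))
```

Pairing two different letters scores the same as leaving both against gaps, which is why only matches contribute to the table.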

Sequence alignment techniques play an important role in biology (see e.g. [21]), speech recognition, pattern recognition, automated translation and other areas where
hidden Markov models are used as an analytic tool. The study of optimal alignments
of random sequences is also interesting to statistical physicists, because it can be 
seen as a first passage percolation problem on an oriented graph with correlated 
weights. First passage percolation is a mature field and has been a major research 
area in discrete probability for several decades. An excellent overview can be found 
in the chapter dedicated to first passage percolation of Volume 110 of the Springer 
Encyclopaedia of Mathematical Sciences. 

Among several long-standing open questions in this field are the problems of 
identifying the exact order of the fluctuations and the proportion of points where the 
optimal alignment is unique. Recently, significant progress has been made on both 
questions: In several special cases it was shown that a positive bias effect of a random 
letter-change on the optimal alignment score of two random sequences of length n 
exists and implies an order of fluctuation proportional to $\sqrt{n}$, see [8], [17], [15], [16].
In [3] it was shown how to apply this principle to arbitrary scoring functions, and
how to detect the positive bias effect via a nontrivial Monte Carlo technique. In [13],
a case study was conducted on using the local uniqueness of optimal alignments to 
detect the homology of two DNA sequences. The motivation behind this method 
is the empirical observation that all optimal alignments of two biologically related 
sequences are identical in most places, while optimal alignments of unrelated sequences 
are locally nonunique in most places. 

Our paper concerns a theoretical study of this last observation. Considering 
two independent random sequences with i.i.d. characters, we examine their optimal 
alignments containing a fixed proportion of gaps and prove that when the proportion 
of gaps is small, then with high probability optimal alignments differ only in a small 
number of places and are locally unique everywhere else. Our result implies that the 
approach of [13] can only work for scoring functions in which gaps are not penalized 
too heavily, as this would force the number of gaps appearing in optimal alignments 
to be small relative to the length of the two sequences. 

Optimal alignments of random sequences are often used as null models in statistical tests to decide whether two or more given sequences are homologous. The mathematical underpinnings are best understood in the context of the LCS setting. Let $L_n$ denote the length of the LCS of two independent binary i.i.d. sequences of length $n$. Using a subadditivity argument, Chvátal and Sankoff [9] showed that the limit

$$\gamma := \lim_{n\to\infty} \frac{\mathbb{E}[L_n]}{n}$$

exists. Determining the exact value of $\gamma$, the so-called Chvátal-Sankoff constant, is a long-standing open problem. However, upper and lower bounds are known, see [5], …

Another long-open problem is the determination of the exact order of the fluctuation of the LCS score as a function of the length of the sequences. Considering the case of binary sequences obtained by flipping perfect coins, it was shown in [20] that $\mathrm{VAR}[L_n] \le n$. Monte Carlo simulations in [9] led to the conjecture that $\mathrm{VAR}[L_n] = o(n^{2/3})$. This order of magnitude is similar to the order for the so-called longest increasing subsequence (LIS) of random permutations, see [7] and [1]. The LIS setting is asymptotically equivalent to first passage percolation on an oriented Poisson random graph. In [22] it was conjectured that in many cases the variance of $L_n$ grows linearly. This indeed seems to be the case generically [3], but there may exist different orders of magnitude for these fluctuations, depending on the distribution of the sequences $X$ and $Y$, see [5], [17], …
1.2. Problem Setting and Key Ideas. Let us consider the set of optimal LCS alignments of the two sequences 'I-do-not-like-symmetry' and 'I-detest-symmetry'. If we were to give an exhaustive list of optimal alignments, we would find that all of them are of the form
that is, all optimal LCS alignments agree in the region outside of the places filled with
wild card stars *. We say that in this region the optimal alignment is locally unique. 
In our example the optimal LCS alignment is locally unique on a large proportion of 
the two sequences. 

This observation is typical not only in the LCS setting, but for general scoring functions $S(x,y)$ as introduced in (1.1): When two sequences are closely related to one another, the optimal alignments are locally unique in many places. Conversely, the optimal alignments of two random sequences with i.i.d. entries are often locally unique only in very few places. The authors of [13] exploited this observation in the design of a homology-detecting algorithm for DNA sequences. However, as this paper will reveal, for this method to work one has to select the gap penalty $s(a,\sqcup)$ quite carefully:
When gaps are strongly penalized, then no more than a constant proportion of gaps 
are observed in optimal alignments, and if the proportion of gaps is small then the 
optimal alignment is locally unique in most places even for i.i.d. random (and hence 
totally unrelated) sequences. 

Our paper concerns a theoretical analysis of this phenomenon. To make the analysis transparent, we chose a simplified setup in which $X = X_1 \dots X_m$ and $Y = Y_1 \dots Y_n$ are random sequences consisting of i.i.d. standard Bernoulli variables, where $m = \lfloor (1-\delta)n \rfloor$ depends on $n$ via a fixed gap proportion $\delta$. We then investigate alignments of $X$ and $Y$ that contain gaps only in $X$, and we use a scoring function in which matching symbols contribute to the total score with unit weight and non-matching ones with zero weight. Our interest is in the random number $U \le m$ of indices $i$ for which $X_i$ is aligned with more than one $Y_j$ under the different optimal alignments. The main theorem of this paper shows that when $\delta$ is small and $n$ is large, optimal alignments are locally unique in an arbitrarily large proportion of places with arbitrarily high probability:

Theorem 1.1. For all $\varepsilon > 0$ there exist $\delta_0 > 0$ and $n_0 \in \mathbb{N}$ such that for all $\delta \in (0,\delta_0)$ and $n \ge n_0$, $\mathbb{P}[U > m\varepsilon] < \varepsilon$.

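While it is clear that $\mathbb{P}[U > m\varepsilon] = 0$ when $\delta = 0$ for any $n$, the theorem shows the nontrivial fact that the limit $\lim_{n\to\infty}\mathbb{P}[U > m\varepsilon]$ is continuous in $\delta$ at $\delta = 0$. Furthermore, the proof provides the quantitative estimate

$$\mathbb{P}[U > m\varepsilon] \le \frac{O\big(\sqrt{H(\delta)}\big)}{\varepsilon/2 - O(\varepsilon^2) - O\big(\sqrt{H(\delta)/\varepsilon}\big)} + O\big(e^{-n\delta}\big),$$

where $H$ is the entropy function of Lemma 2.1.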

Theorem 1.1 is also very interesting in the context of the Chvátal-Sankoff conjecture, which concerns the order of magnitude of the fluctuation of the LCS of two random texts.

We recall the assumptions under which we prove our main result and which will remain valid throughout the rest of this paper: Let $n \in \mathbb{N}$ and let $0 < \delta < 1$ be a fixed constant not depending on $n$. We set $m = \lfloor n - \delta n \rfloor$ and define two independent random sequences $X = X_1 \dots X_m$ and $Y = Y_1 \dots Y_n$ by choosing $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ as i.i.d. Bernoulli variables with parameter 1/2 (i.e., coin tossing experiments). We then consider alignments

$$\begin{array}{ccccccccccc}
\sqcup & \cdots & \sqcup & X_1 & \sqcup & \cdots & \sqcup & X_m & \sqcup & \cdots & \sqcup \\
Y_1 & \cdots & Y_{\xi(1)-1} & Y_{\xi(1)} & Y_{\xi(1)+1} & \cdots & Y_{\xi(m)-1} & Y_{\xi(m)} & Y_{\xi(m)+1} & \cdots & Y_n
\end{array}$$

with gaps in $X$ only and attribute to such an alignment the score

$$S(X,Y;\xi) = \#\{i \in \mathbb{N}_m : X_i = Y_{\xi(i)}\},$$

where $\#$ denotes the cardinality of a set and $\mathbb{N}_n := \{1, \dots, n\}$. The set of alignments $\xi$ that maximize $S(X,Y;\xi)$ is denoted by $\mathcal{OA}_{X,Y}$. Of course, this is a random set of alignments, since $X$ and $Y$ are random. We write $S^*(X,Y) := \max_\xi S(X,Y;\xi)$ for the maximum score and

$$U := \#\{i \in \mathbb{N}_m : \exists\, \xi, \lambda \in \mathcal{OA}_{X,Y} \text{ s.t. } \xi(i) \neq \lambda(i)\}$$

for the number of positions where $X$ is not uniquely aligned with $Y$ among the alignments with maximum score. $U$ is a random variable.
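For very short sequences, $S^*$, $\mathcal{OA}_{X,Y}$ and $U$ can be computed by brute force. The reason is that the alignments with gaps in $X$ only correspond exactly to the increasing $m$-tuples in $\mathbb{N}_n$. The Python sketch below is a hypothetical illustration: the function names are ours, sequences are given as 0/1 strings, and alignments are 1-based tuples. Its cost is exponential in $n$, so it is only meant for checking small cases such as the data of Example 1.2 below.

```python
from itertools import combinations


def optimal_alignments(x, y):
    """Return S*(x, y) and the list of all optimal alignments.

    An alignment with gaps in x only is an increasing tuple
    (xi(1), ..., xi(m)) of positions in {1, ..., n} (1-based, as in the text).
    """
    m, n = len(x), len(y)
    best, opt = -1, []
    for xi in combinations(range(1, n + 1), m):
        score = sum(1 for i in range(m) if x[i] == y[xi[i] - 1])
        if score > best:
            best, opt = score, [xi]
        elif score == best:
            opt.append(xi)
    return best, opt


def nonunique_positions(opt):
    """1-based indices i at which two optimal alignments differ."""
    m = len(opt[0])
    return [i + 1 for i in range(m) if len({xi[i] for xi in opt}) > 1]


s_star, opt = optimal_alignments("001110", "11110011")
U = len(nonunique_positions(opt))
print(s_star, opt, U)
```

For $x = 001110$ and $y = 11110011$ this finds $S^* = 3$, the two optimal alignments $(1,\dots,6)$ and $(1,2,3,4,7,8)$, and $U = 2$.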

Theorem 1.1 states that $\mathbb{P}[U > m\varepsilon] < \varepsilon$ for all $n$ large enough and $\delta$ small enough. In other words, for large $n$ and small gap proportion $\delta$ the optimal alignment is locally unique in a $(1-\varepsilon)$-proportion of the sequence $X$ with probability greater than $1-\varepsilon$. This result is qualitatively representative of what occurs with regard to the local uniqueness of optimal alignments of random sequences under arbitrary scoring functions $S(X,Y)$ as defined in (1.1) whenever gaps are strongly penalized, i.e., whenever $s(a,\sqcup)$ is a negative number of not too small a modulus. Indeed, strong gap penalization prevents optimal alignments from having more than a small proportion of gaps. Furthermore, allowing gaps in only one of the sequences is merely a technical assumption that vastly simplifies the analysis.

Let us now briefly explain the main idea behind the proof of Theorem 1.1. We define a measure-preserving map by picking an entry of $X$ at random and flipping it to the "opposite" value, i.e., a 0 is changed to a 1 or vice versa (we imagine $X$ as a line-up of randomly tossed fair coins). We denote the sequence obtained in this fashion by $\tilde X$. Since this operation is measure preserving, we have

$$\mathbb{E}[\Delta] = 0, \qquad (1.2)$$

where $\Delta := S^*(\tilde X, Y) - S^*(X, Y)$. A crucial observation is now that when the optimal alignment is nonunique in a large proportion of places, then the optimal score tends to increase under this measure-preserving map. We illustrate this phenomenon in Example 1.2 below. Together with (1.2) this observation implies that the probability that the optimal alignment is nonunique in many points is small.

Example 1.2. Consider the case where $n = 8$, $m = 6$, $\delta = 1/4$, and $X$ and $Y$ take the values $x = 001110$ and $y = 11110011$. There are two optimal alignments, $\xi$ given by $\xi(i) = i$ $(i = 1, \dots, 6)$, or

$$\begin{array}{cccccccc} 0 & 0 & 1 & 1 & 1 & 0 & \sqcup & \sqcup \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \end{array}$$

and $\lambda$ given by $\lambda(i) = i$ $(i = 1, \dots, 4)$ and $\lambda(5) = 7$, $\lambda(6) = 8$, or

$$\begin{array}{cccccccc} 0 & 0 & 1 & 1 & \sqcup & \sqcup & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \end{array}$$

The optimal score is $S^*(x,y) = S(x,y;\xi) = S(x,y;\lambda) = 3$.

The following combinatorial property holds for arbitrary alignments $\xi$, $\lambda$ of $x$ and $y$, not only optimal ones: If $i \in \{1, \dots, m\}$ is such that

$$\xi(i) \neq \lambda(i) \quad \text{and} \quad y_{\xi(i)} \neq y_{\lambda(i)}, \qquad (1.3)$$

then flipping $x_i$ to the opposite value increases at least one of the scores $S(x,y;\xi)$, $S(x,y;\lambda)$ by one unit. In particular, if $\xi$ and $\lambda$ are both optimal alignments and condition (1.3) holds, then flipping $x_i$ to the opposite value increases the optimal score by one unit. For the chosen values of $x$ and $y$ we find that $i = 5, 6$ satisfy condition (1.3). Flipping the value of $x_5$ from 1 to 0, we find that the score of $\xi$ increases to 4 and the score of $\lambda$ decreases to 2. The maximum score is now 4. Similarly, flipping $x_6$ from 0 to 1, the score of $\xi$ decreases to 2 whereas the score of $\lambda$ increases to 4. The new maximum score is again 4. If an entry $x_T$ of $x$ is flipped, where $T$ is a random index in $\mathbb{N}_m$, then this implies

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) \neq \lambda(T), y_{\xi(T)} \neq y_{\lambda(T)}\big] = 1. \qquad (1.4)$$



On the other hand, if one of the entries $x_1, \dots, x_4$ that do not satisfy (1.3) is flipped, then the maximum score can increase, decrease or remain unchanged: For $i = 1, \dots, 4$, we have $\xi(i) = \lambda(i)$. The entries $x_1$ and $x_2$ are aligned with non-matching symbols, so that flipping one of these entries increases the optimal score by one. The entries $x_3$ and $x_4$ are aligned with matching symbols. In the present example, switching $x_3$ results in a decrease of the optimal score by one unit, whereas switching $x_4$ leaves the maximum score unchanged at 3; it is then attained by a different alignment, for instance $(1,2,3,5,7,8)$. Thus, we find that if a random entry $x_T$ of $x$ is flipped (where $T \in \mathbb{N}_m$), then for the above choices of $x$ and $y$,

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) = \lambda(T)\big] \ge 0. \qquad (1.5)$$

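For the same reason, if there were any indices $i$ such that $\xi(i) \neq \lambda(i)$ where (1.3) does not hold, then we would find

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) \neq \lambda(T), y_{\xi(T)} = y_{\lambda(T)}\big] \ge 0.$$

In conjunction with (1.4) this implies

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) \neq \lambda(T)\big] \ge \mathbb{P}\big[y_{\xi(T)} \neq y_{\lambda(T)} \,\big|\, X = x, Y = y, \xi(T) \neq \lambda(T)\big]. \qquad (1.6)$$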
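The flip experiment of Example 1.2 can be replayed numerically. The Python sketch below is a brute-force illustration with hypothetical helper names. It takes sequences as lists of 0/1, uses 0-based positions internally, flips each entry $x_t$ in turn, and records $\Delta = S^*(\tilde x, y) - S^*(x, y)$.

```python
from itertools import combinations


def s_star(x, y):
    """Maximum score S*(x, y) over all alignments with gaps in x only."""
    m, n = len(x), len(y)
    return max(sum(x[i] == y[xi[i]] for i in range(m))
               for xi in combinations(range(n), m))


def flip(x, t):
    """Copy of x with entry t (0-based) changed to its opposite value."""
    return x[:t] + [1 - x[t]] + x[t + 1:]


x = [0, 0, 1, 1, 1, 0]
y = [1, 1, 1, 1, 0, 0, 1, 1]
base = s_star(x, y)
deltas = [s_star(flip(x, t), y) - base for t in range(len(x))]
print(base, deltas)
```

For these data the sketch gives $\Delta = 1, 1, -1, 0, 1, 1$ for $t = 1, \dots, 6$. The entries 5 and 6, which satisfy (1.3), give $\Delta = 1$ as in (1.4). The average over $t = 1, \dots, 4$ is $1/4 \ge 0$, in line with (1.5).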

The proof of Theorem 1.1 exploits a generalization of these observations: Lemma 2.3 of Section 2 shows that there exist two optimal alignments $\xi$ and $\lambda$ that differ from each other exactly in the positions $i$ where the optimal alignment of $X$ and $Y$ is not locally unique. In Section 3 we show that, up to an event of negatively exponentially small probability in $n$, the following are true:

i) approximately half the points $i \in \mathbb{N}_m$ with $\xi(i) = \lambda(i)$ satisfy $X_i \neq Y_{\xi(i)}$,
ii) approximately half the points $i \in \mathbb{N}_m$ with $\xi(i) \neq \lambda(i)$ satisfy $Y_{\xi(i)} \neq Y_{\lambda(i)}$,
iii) approximately a quarter of the points $i \in \mathbb{N}_m$ with $\xi(i) \neq \lambda(i)$ satisfy $X_i \neq Y_{\xi(i)} = Y_{\lambda(i)}$,
iv) approximately a quarter of the points $i \in \mathbb{N}_m$ with $\xi(i) \neq \lambda(i)$ satisfy $X_i = Y_{\xi(i)} = Y_{\lambda(i)}$,
v) for all $e_1, e_2 \in \{0,1\}$, approximately a quarter of the points $i \in \mathbb{N}_m$ satisfy $X_i = e_1$ and $Y_{\xi(i)} = e_2$.
Let $T$ be the random index in $\mathbb{N}_m$ that corresponds to the entry of $X$ that is flipped. By the observations of Example 1.2, ii)-iv) lead to the following generalization of (1.6),

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) \neq \lambda(T)\big] \ge \tfrac12. \qquad (1.7)$$

Likewise, i) leads to the following generalization of (1.5),

$$\mathbb{E}\big[\Delta \,\big|\, X = x, Y = y, \xi(T) = \lambda(T)\big] \ge 0, \qquad (1.8)$$

while v) will be used to control the effect of the flip on the score of a fixed alignment.

Of course, (1.7) and (1.8) hold only approximately. Much of the work of Section 3 is devoted to overcoming these imprecisions. For now, let us work with the simplified assumption that (1.7) and (1.8) hold true except on a set $F^c$ of pairs $(X,Y)$ with negatively exponentially small probability $\mathbb{P}[F^c] = \exp(-\Theta(n))$. Since $\Delta \ge -1$, equations (1.2), (1.7) and (1.8) then imply

$$0 = \mathbb{E}[\Delta] \ge \tfrac12 \times \mathbb{P}[\xi(T) \neq \lambda(T)] + 0 \times \mathbb{P}[\xi(T) = \lambda(T)] - 1 \times \mathbb{P}[F^c],$$

so that

$$\mathbb{P}[\xi(T) \neq \lambda(T)] \le \exp(-\Theta(n)). \qquad (1.9)$$

When the approximate statements (1.7) and (1.8) are replaced with correct inequalities, (1.9) turns into the weaker claim of Theorem 1.1, because $\mathbb{P}[\xi(T) \neq \lambda(T)] = \mathbb{E}[U]/m$.

The structure of the remaining sections of this paper is as follows. Section 2 serves to introduce the main notation relevant to scoring functions, alignments and local uniqueness of alignments. We also discuss illustrative examples and prove two technical results of a preliminary nature. In Section 3 we introduce events defined in terms of certain empirical distributions and their large deviations. These events allow putting the above-made approximate statements i)-v) onto a rigorous footing. In Section 4 we formally define the measure-preserving map which flips a random entry of $X$ to its opposite value. In Lemma 4.1 of that section we prove that the locations where the optimal alignment is nonunique tend to introduce a positive bias into $\mathbb{E}[\Delta]$. Section 5 finally brings all the elements together in the proof of Theorem 1.1.



2. Alignments and Scores. Let $(x,y) \in \{0,1\}^m \times \{0,1\}^n$ be a pair of sequences of lengths $m < n$ over the binary alphabet. Let us consider alignments

of $x$ and $y$ that have $\delta n$ gaps in $x$, where $m = \lfloor (1-\delta)n \rfloor$. In the above display we marked gaps with the symbol $\sqcup$. We identify the set of such alignments with the set $\mathcal{A}_{m,n}$ of order-preserving injections of $\mathbb{N}_m$ into $\mathbb{N}_n := \{1, \dots, n\}$, that is, $\xi \in \mathcal{A}_{m,n}$ if and only if $\xi: \mathbb{N}_m \to \mathbb{N}_n$ and $i < j$ implies $\xi(i) < \xi(j)$.
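Lemma 2.1. If $\delta < 5/6$, then $\#\mathcal{A}_{m,n} \le \exp(nH(\delta))$, where $H(\delta) = -(\delta \ln \delta + (1-\delta)\ln(1-\delta))$ is the entropy function.

Proof. Robbins' inequality [19] says that

$$\sqrt{2\pi}\, n^{n+\frac12} e^{-n+\frac{1}{12n+1}} < n! < \sqrt{2\pi}\, n^{n+\frac12} e^{-n+\frac{1}{12n}}.$$

Therefore,

$$\begin{aligned}
\#\mathcal{A}_{m,n} = \binom{n}{n(1-\delta)}
&\le \frac{\sqrt{2\pi}\, n^{n+\frac12} e^{-n+\frac{1}{12n}}}{\sqrt{2\pi}\,(n(1-\delta))^{n(1-\delta)+\frac12} e^{-n(1-\delta)+\frac{1}{12n(1-\delta)+1}} \times \sqrt{2\pi}\,(n\delta)^{n\delta+\frac12} e^{-n\delta+\frac{1}{12n\delta+1}}} \\
&= e^{nH(\delta)} \times \frac{\exp\Big(\frac{1}{12n} - \frac{1}{12n(1-\delta)+1} - \frac{1}{12n\delta+1}\Big)}{\sqrt{2\pi(1-\delta)(n-m)}}.
\end{aligned}$$

Note that the second factor converges to zero for fixed $\delta$ and $n \to \infty$. Moreover, the numerator is $< 1$, and for $\delta < 5/6$ the denominator is $> 1$, since $2\pi(1-\delta)(n-m) > 6(1-\delta) > 1$. □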
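The following Python sketch gives a quick numerical sanity check of Lemma 2.1; it is not part of the proof. It compares $\#\mathcal{A}_{m,n} = \binom{n}{m}$ with $\exp(nH(\delta))$ for $\delta = (n-m)/n$, i.e. it assumes, as the proof does, that $\delta n$ is an integer. It relies on the standard-library function `math.comb`; the helper names are ours.

```python
import math


def entropy(d):
    """H(d) = -(d ln d + (1 - d) ln(1 - d)), natural logarithm."""
    return -(d * math.log(d) + (1 - d) * math.log(1 - d))


def lemma_2_1_sides(n, gaps):
    """Return (#A_{m,n}, exp(n H(delta))) for m = n - gaps and delta = gaps / n."""
    m = n - gaps
    delta = gaps / n
    return math.comb(n, m), math.exp(n * entropy(delta))


count, bound = lemma_2_1_sides(100, 25)
print(count <= bound, count / bound)
```

The ratio printed in the last line is the second factor of the proof, which decays like $1/\sqrt{2\pi(1-\delta)(n-m)}$.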

We define a scoring function $S: \{0,1\}^m \times \{0,1\}^n \times \mathcal{A}_{m,n} \to \mathbb{N}_0$ as follows,

$$S(x,y;\xi) := \sum_{i=1}^m s(x_i, y_{\xi(i)}),$$

where $s(0,0) = s(1,1) = 1$ and $s(0,1) = s(1,0) = 0$. The set of optimal alignments of $(x,y)$ is the set of alignments with maximum score,

$$\mathcal{OA}_{x,y} := \{\xi \in \mathcal{A}_{m,n} : S(x,y;\xi) \ge S(x,y;\lambda)\ \forall \lambda \in \mathcal{A}_{m,n}\}.$$

We write $S^*(x,y) := \max\{S(x,y;\xi) : \xi \in \mathcal{A}_{m,n}\}$ for the maximum score, as before. For each $i \in \mathbb{N}_m$ we define the variable

$$u_i(x,y) := \begin{cases} 1 & \text{if } \exists\, \xi, \lambda \in \mathcal{OA}_{x,y} \text{ s.t. } \xi(i) \neq \lambda(i), \\ 0 & \text{otherwise,} \end{cases}$$

that indicates when the image of $i$ under the set of optimal alignments is nonunique. We say that the optimal alignment is locally nonunique at $i$ if $u_i(x,y) = 1$. We write

$$u(x,y) := \sum_{i=1}^m u_i(x,y)$$

for the number of indices where the optimal alignment is locally nonunique.

The sets $\mathcal{OA}_{x,y}$ and $\{i \in \mathbb{N}_m : u_i(x,y) = 1\}$ can be found via dynamic programming: An $m \times n$ matrix $(\mathrm{score}(i,j))$ is recursively computed, using the rules

r.i) $\mathrm{score}(i,j) = -1$ for $i > j$ or $j > i + \delta n$,
r.ii) $\mathrm{score}(1,j) = s(x_1, y_j)$ for $j = 1, \dots, 1 + \delta n$,
r.iii) $\mathrm{score}(i,j) = s(x_i, y_j) + \max\{\mathrm{score}(i-1,k) : k < j\}$ for all other $(i,j)$.

Arguing recursively, one immediately verifies that

$$S^*(x,y) = \max\{\mathrm{score}(m,j) : j \in \mathbb{N}_n\},$$

and furthermore that $\xi \in \mathcal{OA}_{x,y}$ if and only if the following conditions are satisfied:

c.i) $\xi(m) \in \{j \in \mathbb{N}_n : \mathrm{score}(m,j) = S^*(x,y)\}$,
c.ii) $\xi(i-1) \in \{j < \xi(i) : \mathrm{score}(i-1,j) = \max_{k<\xi(i)} \mathrm{score}(i-1,k)\}$ for all $i = 2, \dots, m$.
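The rules r.i)-r.iii) and the back-tracking conditions c.i)-c.ii) translate directly into code. The following Python sketch is our own illustration, and the function names are hypothetical. It stores the table 1-based as in the text and uses the number of gaps $g = n - m$ in place of $\delta n$. For each row $i$ it collects the columns $j = \xi(i)$ that occur in some optimal alignment, and $u(x,y)$ is read off from these sets.

```python
def score_table(x, y):
    """Table of rules r.i)-r.iii), stored 1-based; -1 marks infeasible cells."""
    m, n = len(x), len(y)
    g = n - m  # number of gaps, written delta*n in the text
    score = [[-1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(i, i + g + 1):  # feasible cells: i <= j <= i + g
            s = 1 if x[i - 1] == y[j - 1] else 0
            if i == 1:
                score[i][j] = s  # rule r.ii)
            else:
                score[i][j] = s + max(score[i - 1][1:j])  # rule r.iii)
    return score


def optimal_cells(x, y):
    """Return S*(x, y) and, for every row i, the set of columns j = xi(i)
    that occur in some optimal alignment xi (conditions c.i) and c.ii))."""
    m, n = len(x), len(y)
    score = score_table(x, y)
    s_star = max(score[m][1:])
    cells = {m: {j for j in range(1, n + 1) if score[m][j] == s_star}}  # c.i)
    for i in range(m, 1, -1):
        previous = set()
        for j in cells[i]:
            best = max(score[i - 1][1:j])
            previous |= {k for k in range(1, j) if score[i - 1][k] == best}  # c.ii)
        cells[i - 1] = previous
    return s_star, cells


def u_value(x, y):
    """u(x, y): number of rows i whose image under the optimal alignments is not unique."""
    _, cells = optimal_cells(x, y)
    return sum(1 for columns in cells.values() if len(columns) > 1)


print(optimal_cells("01010101", "110101101100")[0], u_value("01010101", "110101101100"))
```

For the data of Example 2.2 below this yields $S^* = 7$ and $u = 8 = m$. The table costs $O(m(n-m)n)$ operations as written; a running prefix maximum would remove the inner `max`.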
Fig. 2.1. The scoring matrix of Example 2.2.



Example 2.2. Let $x = 01010101$ and $y = 110101101100$. Then the dynamic programming algorithm described above generates the matrix of Figure 2.1, where it is displayed in tableau format. Optimal paths follow the arrows and pass through the shaded entries. The tableau is annotated with the generating sequences $x$ and $y$, so that the optimal alignments can easily be read off. The following table lists $y$ in the top row, followed by a complete list of optimal alignments of $x$ with $y$,

Every line of the tableau in Figure 2.1 contains multiple shaded entries. Therefore, $u_i(x,y) = 1$ for all $i \in \mathbb{N}_m$ and $u(x,y) = m$.

In the above example we ordered the optimal alignments from leftmost to right- 
most as located within the tableau, that is, alignments are listed in inverse alphabetical 
order with respect to the lateness of gaps. This also provides the idea of proof for the 
following result, which shows that there exist two optimal alignments that differ from 
one another at every point where the optimal alignment is locally nonunique. 

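Lemma 2.3. For all $(x,y) \in \{0,1\}^m \times \{0,1\}^n$ there exist $\xi, \lambda \in \mathcal{OA}_{x,y}$ such that $\xi(i) \neq \lambda(i)$ for exactly those indices $i \in \mathbb{N}_m$ for which $u_i(x,y) = 1$.

Proof. The claim is clearly true if we can prove that $\xi, \lambda \in \mathcal{OA}_{x,y}$, where

$$\xi(i) := \min\{\psi(i) : \psi \in \mathcal{OA}_{x,y}\} \quad \forall i \in \mathbb{N}_m,$$
$$\lambda(i) := \max\{\psi(i) : \psi \in \mathcal{OA}_{x,y}\} \quad \forall i \in \mathbb{N}_m.$$

If $\lambda \notin \mathcal{OA}_{x,y}$, then there exists an index $i \in \mathbb{N}_m \setminus \{1\}$ such that $\psi(i-1) < \lambda(i-1)$ for all $\psi \in \mathcal{OA}_{x,y}$ such that $\psi(i) = \lambda(i)$. On the other hand, there exists $\tilde\psi \in \mathcal{OA}_{x,y}$ such that $\tilde\psi(i-1) = \lambda(i-1)$, and therefore it is necessarily the case that $\tilde\psi(i) < \psi(i) = \lambda(i)$. But $\tilde\psi$ and $\psi$ satisfy condition c.ii), that is,

$$\tilde\psi(i-1) \in \Big\{j < \tilde\psi(i) : \mathrm{score}(i-1,j) = \max_{k<\tilde\psi(i)} \mathrm{score}(i-1,k)\Big\},$$
$$\psi(i-1) \in \Big\{j < \psi(i) : \mathrm{score}(i-1,j) = \max_{k<\psi(i)} \mathrm{score}(i-1,k)\Big\}.$$

Therefore, either $\max_{k<\tilde\psi(i)} \mathrm{score}(i-1,k) = \max_{k<\psi(i)} \mathrm{score}(i-1,k)$, and then there exists $\hat\psi \in \mathcal{OA}_{x,y}$ such that $\hat\psi(i) = \psi(i) = \lambda(i)$ and $\hat\psi(i-1) = \tilde\psi(i-1) = \lambda(i-1)$, or else $\max_{k<\tilde\psi(i)} \mathrm{score}(i-1,k) < \max_{k<\psi(i)} \mathrm{score}(i-1,k)$, and then $\psi(i-1) > \tilde\psi(i-1) = \lambda(i-1)$. In either case we have a contradiction, and this shows that $\lambda \in \mathcal{OA}_{x,y}$ indeed. The proof that $\xi \in \mathcal{OA}_{x,y}$ is analogous. □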

Note that the alignments $\xi$ and $\lambda$ constructed in the proof of Lemma 2.3 are uniquely determined by $(x,y)$. Furthermore, they satisfy the relation $\xi \le \lambda$, which is defined by $\xi(i) \le \lambda(i)$ for all $i \in \mathbb{N}_m$.
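The construction in the proof of Lemma 2.3 can be checked by brute force on Example 2.2. The Python sketch below uses hypothetical names, has exponential cost and works with 1-based alignments. It forms the pointwise minimum $\xi$ and the pointwise maximum $\lambda$ over all optimal alignments, and the checks confirm that both are themselves optimal.

```python
from itertools import combinations


def optimal_set(x, y):
    """All optimal alignments (1-based increasing tuples) by exhaustive search."""
    m, n = len(x), len(y)
    scored = [(sum(x[i] == y[xi[i] - 1] for i in range(m)), xi)
              for xi in combinations(range(1, n + 1), m)]
    best = max(score for score, _ in scored)
    return {xi for score, xi in scored if score == best}


def extremal_alignments(x, y):
    """Pointwise minimum and maximum of the optimal alignments (proof of Lemma 2.3)."""
    opt = optimal_set(x, y)
    m = len(x)
    xi = tuple(min(a[i] for a in opt) for i in range(m))
    lam = tuple(max(a[i] for a in opt) for i in range(m))
    return xi, lam, opt


xi, lam, opt = extremal_alignments("01010101", "110101101100")
print(xi, lam, xi in opt and lam in opt)
```

Here the leftmost alignment is $(1,2,3,4,5,6,8,9)$ and the rightmost is $(3,4,5,7,8,10,11,12)$. They differ in all eight positions, matching $u(x,y) = m$.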

3. Large Deviations of Some Empirical Distributions. In this section we establish a rigorous version of the approximate inequalities (1.7), (1.8) and statements i)-v) of Section 1.2. Recall that $X = X_1 \dots X_m$ and $Y = Y_1 \dots Y_n$ are two independent random sequences that consist of i.i.d. standard Bernoulli variables $X_i, Y_j \sim \mathcal{B}(1/2)$ defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We write $U_i$ and $U$ for the random variables $u_i(X,Y)$ and $u(X,Y)$ respectively. Furthermore, we think of $\delta \in (0,1)$ as a fixed gap proportion that relates $m$ to $n$ via $m = \lfloor (1-\delta)n \rfloor$.

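For $\xi \in \mathcal{A}_{m,n}$ fixed and $\omega \in \Omega$ let $\mathcal{P}_\xi(\omega)$ be the empirical distribution of $(X_i(\omega), Y_{\xi(i)}(\omega))$ over $i \in \mathbb{N}_m$, i.e., the distribution of $(X_T(\omega), Y_{\xi(T)}(\omega))$ when $T \sim \mathcal{U}(\mathbb{N}_m)$ is a random index with uniform distribution on $\mathbb{N}_m$. Yet another way to define this distribution is to require that for all $(e_1, e_2) \in \{0,1\}^2$,

$$\mathcal{P}_\xi(\omega)[(e_1,e_2)] = \frac1m \times \#\{i \in \mathbb{N}_m : X_i(\omega) = e_1,\ Y_{\xi(i)}(\omega) = e_2\}.$$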



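Example 3.1. Let $x$, $y$ and $\xi$ be chosen as in Example 1.2. If $\omega \in \Omega$ is chosen such that $X(\omega) = x$ and $Y(\omega) = y$, then

$$\mathcal{P}_\xi(\omega)[(0,0)] = \tfrac16, \quad \mathcal{P}_\xi(\omega)[(0,1)] = \tfrac13, \quad \mathcal{P}_\xi(\omega)[(1,0)] = \tfrac16, \quad \mathcal{P}_\xi(\omega)[(1,1)] = \tfrac13.$$

Note that $\mathcal{P}_\xi(\omega)$ only depends on $x$ and $y$.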
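The empirical distribution $\mathcal{P}_\xi$ is a plain frequency count. The following minimal Python sketch assumes 0/1 lists and a 1-based alignment tuple, and the helper names are ours. It reproduces the values of Example 3.1 exactly using `fractions.Fraction`, and it evaluates the maximal deviation from 1/4 that the event $E^\xi$ defined below controls.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(x, y, xi):
    """P_xi[(e1, e2)] = #{i : x_i = e1, y_xi(i) = e2} / m for a 1-based alignment xi."""
    m = len(x)
    counts = Counter((x[i], y[xi[i] - 1]) for i in range(m))
    return {(e1, e2): Fraction(counts[(e1, e2)], m) for e1 in (0, 1) for e2 in (0, 1)}


def max_deviation(dist):
    """Maximum over (e1, e2) of |P_xi[(e1, e2)] - 1/4|."""
    return max(abs(p - Fraction(1, 4)) for p in dist.values())


x = [0, 0, 1, 1, 1, 0]
y = [1, 1, 1, 1, 0, 0, 1, 1]
dist = empirical_distribution(x, y, (1, 2, 3, 4, 5, 6))
print(dist, max_deviation(dist))
```

For Example 3.1 the maximal deviation from the uniform distribution on $\{0,1\}^2$ is $1/12$.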
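Let $E^\xi$ be the event that

$$\max_{(e_1,e_2)\in\{0,1\}^2} \Big|\mathcal{P}_\xi(\omega)[(e_1,e_2)] - \tfrac14\Big| < \sqrt{\frac{9H(\delta)}{4(1-\delta)}} =: \varepsilon_1(\delta), \qquad (3.1)$$

in other words,

$$E^\xi := \Big\{\omega \in \Omega : \big\|\mathcal{P}_\xi(\omega) - \mathcal{B}(1/2) \otimes \mathcal{B}(1/2)\big\|_\infty < \varepsilon_1(\delta)\Big\}.$$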

Let us furthermore define the event

$$E_{m,n} := \bigcap_{\xi \in \mathcal{A}_{m,n}} E^\xi.$$



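In a similar vein we define empirical distributions and events relating to $(X_i, Y_{\xi(i)}, Y_{\lambda(i)})$ as follows: Let $\varepsilon > 0$ and $\xi, \lambda \in \mathcal{A}_{m,n}$ be fixed such that $\xi \le \lambda$ and $d(\xi,\lambda) := \#\{i : \xi(i) \neq \lambda(i)\} > m\varepsilon$. For all $i$ such that $\xi(i) = \lambda(i)$ we define the random variables

$$R_i^{\xi,\lambda} := \begin{cases} 1 & \text{if } X_i \neq Y_{\xi(i)}, \\ -1 & \text{if } X_i = Y_{\xi(i)}. \end{cases}$$

Likewise, for all $i$ such that $\xi(i) \neq \lambda(i)$ we define the random variables

$$R_i^{\xi,\lambda} := \begin{cases} 0 & \text{if } Y_{\xi(i)} \neq Y_{\lambda(i)}, \\ 1 & \text{if } X_i \neq Y_{\xi(i)} = Y_{\lambda(i)}, \\ -1 & \text{if } X_i = Y_{\xi(i)} = Y_{\lambda(i)}. \end{cases}$$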
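The case distinctions defining $R_i^{\xi,\lambda}$ are easy to mechanize. The Python sketch below uses our own names, 0/1 lists and 1-based alignments. It computes the $R_i$ together with their empirical distributions over the agreeing positions, the disagreeing positions and all positions. These are the three empirical distributions named agree, disag and unif in the notation that follows.

```python
from collections import Counter
from fractions import Fraction


def r_values(x, y, xi, lam):
    """R_i^{xi,lam} for i = 1..m, following the two case distinctions above."""
    r = []
    for i in range(len(x)):
        a, b = y[xi[i] - 1], y[lam[i] - 1]
        if xi[i] != lam[i] and a != b:
            r.append(0)  # Y_xi(i) != Y_lam(i)
        else:
            # either xi(i) = lam(i), or xi(i) != lam(i) with Y_xi(i) = Y_lam(i)
            r.append(1 if x[i] != a else -1)
    return r


def empirical(values):
    """Empirical distribution of a list of values, as exact fractions."""
    counts = Counter(values)
    return {v: Fraction(k, len(values)) for v, k in sorted(counts.items())}


def distributions(x, y, xi, lam):
    """Empirical distributions over agreeing, disagreeing and all positions."""
    r = r_values(x, y, xi, lam)
    agree = [r[i] for i in range(len(x)) if xi[i] == lam[i]]
    disag = [r[i] for i in range(len(x)) if xi[i] != lam[i]]
    return empirical(agree), empirical(disag), empirical(r)


x = [0, 0, 1, 1, 1, 0]
y = [1, 1, 1, 1, 0, 0, 1, 1]
xi, lam = (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 7, 8)
print(r_values(x, y, xi, lam), distributions(x, y, xi, lam))
```

For $\xi$ and $\lambda$ of Example 1.2 this gives $R = (1, 1, -1, -1, 0, 0)$. Entries with $R_i \in \{0, 1\}$ are exactly those whose flip cannot lower the optimal score.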



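We now introduce the following notation:

• $\mathcal{P}^{\mathrm{agree}}_{\xi,\lambda}(\omega)$, $\mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}(\omega)$ and $\mathcal{P}^{\mathrm{unif}}_{\xi,\lambda}(\omega)$ are the empirical distributions of $R_i^{\xi,\lambda}(\omega)$ over $\{i \in \mathbb{N}_m : \xi(i) = \lambda(i)\}$, $\{i \in \mathbb{N}_m : \xi(i) \neq \lambda(i)\}$ and $i \in \mathbb{N}_m$ respectively,
• $\mathcal{P}^{\mathrm{agree}}$ is the distribution on $\{-1,1\}$ defined by $\mathcal{P}^{\mathrm{agree}}[-1] = 1/2 = \mathcal{P}^{\mathrm{agree}}[1]$, and $\mathcal{P}^{\mathrm{disag}}$ is the distribution on $\{-1,0,1\}$ defined by $\mathcal{P}^{\mathrm{disag}}[-1] = 1/4 = \mathcal{P}^{\mathrm{disag}}[1]$ and $\mathcal{P}^{\mathrm{disag}}[0] = 1/2$,
• the parameters $\varepsilon_2$, $\varepsilon_3$ and $\varepsilon_4$ are determined in terms of $\delta$ and $\varepsilon$ via the relations

$$\varepsilon_2(\delta,\varepsilon) = \sqrt{\frac{3H(\delta)}{2(1-\delta)\varepsilon}}, \qquad \varepsilon_3(\delta,\varepsilon) = \sqrt{\frac{27H(\delta)}{8(1-\delta)\varepsilon}}, \qquad \varepsilon_4(\delta,\varepsilon) = \sqrt{\frac{9H(\delta)}{2(1-\delta)(1-\varepsilon)}}.$$

With this notation, we define the following events,

$$F_{\xi,\lambda} := \Big\{\omega \in \Omega : \big\|\mathcal{P}^{\mathrm{agree}}_{\xi,\lambda}(\omega) - \mathcal{P}^{\mathrm{agree}}\big\|_\infty < \varepsilon_2(\delta,\varepsilon)\Big\},$$

$$G_{\xi,\lambda} := \Big\{\omega \in \Omega : \big\|\mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}(\omega) - \mathcal{P}^{\mathrm{disag}}\big\|_\infty < \varepsilon_3(\delta,\varepsilon)\Big\},$$

$$H_{\xi,\lambda} := \Big\{\omega \in \Omega : \big\|\mathcal{P}^{\mathrm{unif}}_{\xi,\lambda}(\omega) - \mathcal{P}^{\mathrm{disag}}\big\|_\infty < \varepsilon_4(\delta,\varepsilon) + 2\varepsilon\Big\},$$

$$F_{m,n,\varepsilon} := \bigcap_{\{(\xi,\lambda):\, \xi \le \lambda,\ m\varepsilon < d(\xi,\lambda) < m(1-\varepsilon)\}} \big(F_{\xi,\lambda} \cap G_{\xi,\lambda}\big) \ \cap \bigcap_{\{(\xi,\lambda):\, \xi \le \lambda,\ m(1-\varepsilon) \le d(\xi,\lambda)\}} H_{\xi,\lambda}.$$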



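Lemma 3.2. For all $\delta < 5/6$ and $\varepsilon > 0$,

i) $\mathbb{P}[E_{m,n}^c] \le 8e^{-nH(\delta)}$,
ii) $\mathbb{P}[F_{m,n,\varepsilon}^c] \le 10e^{-nH(\delta)}$.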

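Proof. i) The Azuma-Hoeffding Theorem [5], [13] says that if $(V_0, \dots, V_m)$ is a martingale with $V_0 = 0$ and $\mathbb{P}[|V_k - V_{k-1}| \le a] = 1$ for all $k \in \mathbb{N}_m$, then

$$\mathbb{P}[V_m \ge mb] \le \exp\Big(-\frac{mb^2}{2a^2}\Big)$$

for all $b > 0$. For $\xi \in \mathcal{A}_{m,n}$ and $(e_1,e_2) \in \{0,1\}^2$ fixed let

$$Z_i(\omega) := \begin{cases} 1 & \text{if } (X_i(\omega), Y_{\xi(i)}(\omega)) = (e_1,e_2), \\ 0 & \text{otherwise.} \end{cases}$$

Since $\xi$ is injective, the pairs $(X_i, Y_{\xi(i)})$ are independent, so that $Z_i$ $(i \in \mathbb{N}_m)$ are i.i.d. random variables with expectation $\mathbb{E}[Z_i] = 1/4$. If we set $V_0 := 0$ and

$$V_k := \sum_{i=1}^k (Z_i - \mathbb{E}[Z_i]) \qquad (k \in \mathbb{N}_m),$$

then $(V_0, \dots, V_m)$ is a martingale with $|V_k - V_{k-1}| \le 3/4$ for all $k$. By the Azuma-Hoeffding Theorem, we have

$$\mathbb{P}\Big[\frac1m \sum_{i=1}^m Z_i - \frac14 \ge \varepsilon_1(\delta)\Big] \le \exp\Big(-\frac{8m\varepsilon_1^2(\delta)}{9}\Big) \le e^{-2nH(\delta)}.$$

Applying the same reasoning to the martingale $(-V_0, \dots, -V_m)$, we find

$$\mathbb{P}\Big[\frac14 - \frac1m \sum_{i=1}^m Z_i \ge \varepsilon_1(\delta)\Big] \le e^{-2nH(\delta)},$$

so that

$$\mathbb{P}\Big[\big|\mathcal{P}_\xi[(e_1,e_2)] - \tfrac14\big| \ge \varepsilon_1(\delta)\Big] \le 2e^{-2nH(\delta)}.$$

Since this is true for all $(e_1,e_2) \in \{0,1\}^2$, simple union bounds show that $\mathbb{P}[(E^\xi)^c] \le 8e^{-2nH(\delta)}$ and

$$\mathbb{P}[E_{m,n}^c] = \mathbb{P}\Big[\bigcup_{\xi \in \mathcal{A}_{m,n}} (E^\xi)^c\Big] \le \#\mathcal{A}_{m,n} \times 8e^{-2nH(\delta)} \le 8e^{-nH(\delta)},$$

where the last inequality follows from Lemma 2.1.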



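ii) The proof of the second part is similar: Let $(\xi,\lambda)$ be such that $\xi \le \lambda$ and $d(\xi,\lambda) > m\varepsilon$. For all $i$ such that $\xi(i) \neq \lambda(i)$ we have $\xi(i) < \lambda(i)$. For $e \in \{-1,0,1\}$ fixed let

$$Z_i := \begin{cases} 1 & \text{if } R_i^{\xi,\lambda} = e, \\ 0 & \text{otherwise} \end{cases} \qquad (i \in \{k : \xi(k) \neq \lambda(k)\}).$$

Then we have

$$\mathbb{E}[Z_i] = \begin{cases} \frac14 & \text{if } e \in \{-1,1\}, \\ \frac12 & \text{if } e = 0, \end{cases}$$

so that $|Z_i - \mathbb{E}[Z_i]| \le 3/4$ in all three cases. Furthermore, the random variables $R_i^{\xi,\lambda}$ $(i \in \{k : \xi(k) \neq \lambda(k)\})$ are i.i.d. with distribution $\mathcal{P}^{\mathrm{disag}}$, and consequently the centred indicators $Z_i - \mathbb{E}[Z_i]$ are i.i.d. as well. This is seen by induction, using the observation that for all index sets

$$I \subseteq \{k : \xi(k) \neq \lambda(k)\},$$

if $i_{\max} = \max I$, then $X_{i_{\max}}$ and $Y_{\lambda(i_{\max})}$ do not appear in any of the expressions $(X_i, Y_{\xi(i)}, Y_{\lambda(i)})$ $(i \in I \setminus \{i_{\max}\})$, so that independently of the value of $Y_{\xi(i_{\max})}$ (which could have appeared in the above expressions at most once, as $Y_{\lambda(i)}$ for some $i < i_{\max}$), we have

$$\mathbb{P}\big[Y_{\lambda(i_{\max})} \neq Y_{\xi(i_{\max})}\big] = \tfrac12, \qquad \mathbb{P}\big[X_{i_{\max}} \neq Y_{\xi(i_{\max})} = Y_{\lambda(i_{\max})}\big] = \tfrac14,$$

and $\mathbb{P}\big[X_{i_{\max}} = Y_{\xi(i_{\max})} = Y_{\lambda(i_{\max})}\big] = \tfrac14$. We define $V_0 := 0$ and, for $k \in \mathbb{N}_{d(\xi,\lambda)}$, $V_k := V_{k-1} + Z_i - \mathbb{E}[Z_i]$, where

$$i = \min\{l \in \mathbb{N}_m : \#\{j \le l : \xi(j) \neq \lambda(j)\} = k\}.$$

Then $(V_0, \dots, V_{d(\xi,\lambda)})$ is a martingale, and arguing as above by way of the Azuma-Hoeffding Theorem, we find

$$\mathbb{P}\Big[\big|\mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{disag}}[e]\big| \ge \varepsilon_3(\delta,\varepsilon)\Big] = \mathbb{P}\Big[\big|V_{d(\xi,\lambda)}\big| \ge d(\xi,\lambda)\,\varepsilon_3(\delta,\varepsilon)\Big] \le 2\exp\Big(-\frac{8d(\xi,\lambda)\,\varepsilon_3^2(\delta,\varepsilon)}{9}\Big) \le 2e^{-3nH(\delta)}.$$

Since this holds for all $e \in \{-1,0,1\}$, we have

$$\mathbb{P}[G_{\xi,\lambda}^c] \le 6e^{-3nH(\delta)} \qquad (3.2)$$

whenever $d(\xi,\lambda) > m\varepsilon$ and $\xi \le \lambda$.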

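If it is even the case that $d(\xi,\lambda) > m(1-\varepsilon)$, then we find in the same way

$$\mathbb{P}\Big[\big|\mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{disag}}[e]\big| \ge \varepsilon_4(\delta,\varepsilon)\Big] \le 2\exp\Big(-\frac{8d(\xi,\lambda)\,\varepsilon_4^2(\delta,\varepsilon)}{9}\Big) \le 2e^{-3nH(\delta)},$$

and also, since fewer than $m\varepsilon$ indices satisfy $\xi(i) = \lambda(i)$,

$$\big|\mathcal{P}^{\mathrm{unif}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}[e]\big| < 2\varepsilon.$$

Therefore,

$$\mathbb{P}\Big[\big|\mathcal{P}^{\mathrm{unif}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{disag}}[e]\big| \ge \varepsilon_4(\delta,\varepsilon) + 2\varepsilon\Big] \le \mathbb{P}\Big[\big|\mathcal{P}^{\mathrm{disag}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{disag}}[e]\big| \ge \varepsilon_4(\delta,\varepsilon)\Big] \le 2e^{-3nH(\delta)}.$$

Since this holds for all $e \in \{-1,0,1\}$, we have

$$\mathbb{P}[H_{\xi,\lambda}^c] \le 6e^{-3nH(\delta)} \qquad (3.3)$$

whenever $d(\xi,\lambda) > m(1-\varepsilon)$ and $\xi \le \lambda$.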

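Next, let $\xi \le \lambda$ be such that $d(\xi,\lambda) < m(1-\varepsilon)$, and for $e \in \{-1,1\}$ fixed let

$$Z_i := \begin{cases} 1 & \text{if } R_i^{\xi,\lambda} = e, \\ 0 & \text{otherwise} \end{cases} \qquad (i \in \{k : \xi(k) = \lambda(k)\}).$$

Then $\mathbb{E}[Z_i] = 1/2$, so that $|Z_i - \mathbb{E}[Z_i]| = 1/2$, and

$$(Z_i - \mathbb{E}[Z_i]), \qquad (i \in \{k : \xi(k) = \lambda(k)\})$$

are i.i.d. random variables, the $R_i^{\xi,\lambda}$ having distribution $\mathcal{P}^{\mathrm{agree}}$. Let $V_0 := 0$, and for $k \in \mathbb{N}_{m-d(\xi,\lambda)}$, $V_k := V_{k-1} + Z_i - \mathbb{E}[Z_i]$, where

$$i = \min\{l \in \mathbb{N}_m : \#\{j \le l : \xi(j) = \lambda(j)\} = k\}.$$

Then $(V_0, \dots, V_{m-d(\xi,\lambda)})$ is a martingale and the large deviations argument from above shows that

$$\mathbb{P}\Big[\big|\mathcal{P}^{\mathrm{agree}}_{\xi,\lambda}[e] - \mathcal{P}^{\mathrm{agree}}[e]\big| \ge \varepsilon_2(\delta,\varepsilon)\Big] = \mathbb{P}\Big[\big|V_{m-d(\xi,\lambda)}\big| \ge (m - d(\xi,\lambda))\,\varepsilon_2(\delta,\varepsilon)\Big] \le 2e^{-2(m-d(\xi,\lambda))\varepsilon_2^2(\delta,\varepsilon)} \le 2e^{-3nH(\delta)}.$$

Since this holds for both $e \in \{1,-1\}$, we find

$$\mathbb{P}[F_{\xi,\lambda}^c] \le 4e^{-3nH(\delta)} \qquad (3.4)$$

whenever $d(\xi,\lambda) < m(1-\varepsilon)$ and $\xi \le \lambda$.

Finally, the combination of equations (3.2), (3.3), (3.4) and Lemma 2.1 shows that

$$\begin{aligned}
\mathbb{P}[F_{m,n,\varepsilon}^c]
&\le \sum_{\{(\xi,\lambda):\, \xi \le \lambda,\ m\varepsilon < d(\xi,\lambda) < m(1-\varepsilon)\}} \big(\mathbb{P}[F_{\xi,\lambda}^c] + \mathbb{P}[G_{\xi,\lambda}^c]\big) + \sum_{\{(\xi,\lambda):\, \xi \le \lambda,\ m(1-\varepsilon) \le d(\xi,\lambda)\}} \mathbb{P}[H_{\xi,\lambda}^c] \\
&\le (\#\mathcal{A}_{m,n})^2 \times 10e^{-3nH(\delta)} \le 10e^{-nH(\delta)}.
\end{aligned}$$

□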
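One way to see how conservative the Azuma-Hoeffding step in part i) of the proof is: simulate, for a single fixed alignment $\xi$, the frequency of the pattern (0,0) among the $m$ pairs $(X_i, Y_{\xi(i)})$. These pairs are i.i.d. uniform on $\{0,1\}^2$ because $\xi$ is injective, so they can be sampled directly. The Python sketch below is our own illustration, and the seed, trial count and parameters are arbitrary choices. It compares the simulated two-sided tail with the bound $2\exp(-8mb^2/9)$ obtained above with $a = 3/4$.

```python
import math
import random


def azuma_two_sided(m, b, a=0.75):
    """2 exp(-m b^2 / (2 a^2)); with a = 3/4 this is 2 exp(-8 m b^2 / 9)."""
    return 2 * math.exp(-m * b * b / (2 * a * a))


def simulated_tail(m, b, trials=1000, seed=1):
    """Monte Carlo estimate of P[|P_xi[(0, 0)] - 1/4| >= b] for a fixed alignment."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        count = 0
        for _ in range(m):
            # (X_i, Y_xi(i)) is uniform on {0,1}^2; count the pattern (0, 0)
            x_bit, y_bit = rng.randrange(2), rng.randrange(2)
            count += (x_bit, y_bit) == (0, 0)
        if abs(count / m - 0.25) >= b:
            hits += 1
    return hits / trials


print(simulated_tail(200, 0.1), azuma_two_sided(200, 0.1))
```

With $m = 200$ and $b = 0.1$ the bound is about 0.34, whereas the simulated tail is typically far smaller. The price of using a distribution-free bound is compensated in the proof by the generous choice of $\varepsilon_1(\delta)$.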



4. An Ergodic Map. Let us now introduce an ergodic map as follows: Let $T \sim \mathcal{U}(\mathbb{N}_m)$ be a uniform random variable on $\mathbb{N}_m$. By Kolmogorov's theorem we may assume without loss of generality that $(\Omega, \mathcal{F}, \mathbb{P})$ is extended so that $T$ is defined on $\Omega$ and independent of the $X_i$ and $Y_j$. Let us define new random sequences $\tilde X = \tilde X_1 \dots \tilde X_m$ and $\tilde Y = \tilde Y_1 \dots \tilde Y_n$ by setting $\tilde Y := Y$, $\tilde X_i := X_i$ for all $i \neq T$ and

$$\tilde X_T := X_T + 1 \bmod 2.$$

In other words, $(\tilde X, \tilde Y)$ is obtained from $(X,Y)$ by flipping one random bit of $X$ and keeping all other entries of $X$ and $Y$ unchanged. The map $(X,Y) \mapsto (\tilde X, \tilde Y)$ is measure-preserving, since $\tilde X$ and $\tilde Y$ again consist of i.i.d. standard Bernoulli variables. Therefore,

$$\mathbb{E}[\Delta] = 0, \qquad (4.1)$$



where $\Delta := S^*(\tilde X, \tilde Y) - S^*(X,Y)$. The construction in the proof of Lemma 2.3 shows that there exists a $\sigma(X,Y)$-measurable map

$$\omega \mapsto (\Xi_\omega, \Lambda_\omega)$$

such that for all $\omega \in \Omega$, $\Xi_\omega, \Lambda_\omega \in \mathcal{OA}_{X(\omega),Y(\omega)}$, $\Xi_\omega \le \Lambda_\omega$ and

$$\{i : \Xi_\omega(i) \neq \Lambda_\omega(i)\} = \{i : U_i(\omega) = 1\}.$$

Furthermore, $X$ and $Y$ define the $\sigma(X,Y)$-measurable events $E_{m,n}$, $F_{m,n,\varepsilon}$ and the $\sigma(X,Y)$-measurable random variable $U$ introduced in Section 3. The following two lemmas show how these objects affect $\Delta$ and will be the key tools in the proof of the main theorem of this paper.
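Both (4.1) and the key observation in the proof of Lemma 4.1 below can be checked exhaustively for tiny $m$ and $n$. The Python sketch below is our own illustration, with hypothetical names and exponential cost. Summing $\Delta$ over all $x \in \{0,1\}^m$, $y \in \{0,1\}^n$ and $T \in \mathbb{N}_m$ should give 0, because for fixed $y$ and $T$ the flip is a bijection of $\{0,1\}^m$. In addition, whenever $\Xi(T) \neq \Lambda(T)$ and $y_{\Xi(T)} \neq y_{\Lambda(T)}$ for the leftmost and rightmost optimal alignments $\Xi$, $\Lambda$, one should find $\Delta = 1$.

```python
from itertools import combinations, product


def best_and_extremes(x, y):
    """S*(x, y) and the leftmost/rightmost optimal alignments (0-based positions)."""
    m, n = len(x), len(y)
    best, opt = -1, []
    for xi in combinations(range(n), m):
        s = sum(x[i] == y[xi[i]] for i in range(m))
        if s > best:
            best, opt = s, [xi]
        elif s == best:
            opt.append(xi)
    lo = tuple(min(a[i] for a in opt) for i in range(m))
    hi = tuple(max(a[i] for a in opt) for i in range(m))
    return best, lo, hi


def check_flip_map(m, n):
    """Sum of Delta over all (x, y, T), and whether the key observation always holds."""
    total, key_ok = 0, True
    for x in product((0, 1), repeat=m):
        for y in product((0, 1), repeat=n):
            s, lo, hi = best_and_extremes(x, y)
            for t in range(m):
                x_flip = x[:t] + (1 - x[t],) + x[t + 1:]
                delta = best_and_extremes(x_flip, y)[0] - s
                total += delta
                if lo[t] != hi[t] and y[lo[t]] != y[hi[t]] and delta != 1:
                    key_ok = False
    return total, key_ok


print(check_flip_map(4, 6))
```

A vanishing total is the finite, exact counterpart of (4.1) under the uniform measure on $\{0,1\}^m \times \{0,1\}^n \times \mathbb{N}_m$.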

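Lemma 4.1. For all $\delta < 5/6$ and $\varepsilon > 0$,

$$\mathbb{E}\big[\Delta \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \ge \Big[\tfrac12 - 3\max\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon,\ \varepsilon_3(\delta,\varepsilon)\big)\Big] \times \varepsilon + \min\Big[\tfrac12 - 3\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big),\ -2\varepsilon_2(\delta,\varepsilon)\Big] \times (1-\varepsilon).$$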



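Proof. A key observation is that $Y_{\Xi(T)} \neq Y_{\Lambda(T)}$ implies $\Delta = 1$: Without loss of generality we may assume that $X_T = Y_{\Lambda(T)}$, so that $X_T \neq Y_{\Xi(T)}$ and $S^*(X,Y) = \sum_{i \neq T} s(X_i, Y_{\Xi(i)})$. But then we have

$$S^*(X,Y) + 1 \ge S^*(\tilde X, \tilde Y) \ge S(\tilde X, \tilde Y; \Xi) = \sum_{i \neq T} s(X_i, Y_{\Xi(i)}) + s(\tilde X_T, Y_{\Xi(T)}) = S^*(X,Y) + 1,$$

so that $\Delta = 1$ indeed. Likewise, $X_T \neq Y_{\Xi(T)} = Y_{\Lambda(T)}$ implies $\Delta = 1$. Using these facts and the trivial lower bound $\Delta \ge -1$, we have

$$\begin{aligned}
\mathbb{E}\big[\Delta \,\big|\, U > m(1-\varepsilon), F_{m,n,\varepsilon}\big]
&\ge \mathbb{E}\Big[1 \times \mathcal{P}^{\mathrm{unif}}_{\Xi,\Lambda}[0] + 1 \times \mathcal{P}^{\mathrm{unif}}_{\Xi,\Lambda}[1] - 1 \times \mathcal{P}^{\mathrm{unif}}_{\Xi,\Lambda}[-1] \,\Big|\, U > m(1-\varepsilon), F_{m,n,\varepsilon}\Big] \\
&\ge 1 \times \big(\mathcal{P}^{\mathrm{disag}}[0] - \varepsilon_4(\delta,\varepsilon) - 2\varepsilon\big) + 1 \times \big(\mathcal{P}^{\mathrm{disag}}[1] - \varepsilon_4(\delta,\varepsilon) - 2\varepsilon\big) \\
&\quad - 1 \times \big(\mathcal{P}^{\mathrm{disag}}[-1] + \varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big) \\
&= \tfrac12 - 3\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big), \qquad (4.2)
\end{aligned}$$

$$\begin{aligned}
\mathbb{E}\big[\Delta \,\big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) \neq \Lambda(T)\big]
&\ge \mathbb{E}\Big[1 \times \mathcal{P}^{\mathrm{disag}}_{\Xi,\Lambda}[0] + 1 \times \mathcal{P}^{\mathrm{disag}}_{\Xi,\Lambda}[1] - 1 \times \mathcal{P}^{\mathrm{disag}}_{\Xi,\Lambda}[-1] \,\Big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) \neq \Lambda(T)\Big] \\
&\ge 1 \times \big(\mathcal{P}^{\mathrm{disag}}[0] - \varepsilon_3(\delta,\varepsilon)\big) + 1 \times \big(\mathcal{P}^{\mathrm{disag}}[1] - \varepsilon_3(\delta,\varepsilon)\big) - 1 \times \big(\mathcal{P}^{\mathrm{disag}}[-1] + \varepsilon_3(\delta,\varepsilon)\big) \\
&= \tfrac12 - 3\varepsilon_3(\delta,\varepsilon), \qquad (4.3)
\end{aligned}$$

$$\begin{aligned}
\mathbb{E}\big[\Delta \,\big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) = \Lambda(T)\big]
&\ge \mathbb{E}\Big[1 \times \mathcal{P}^{\mathrm{agree}}_{\Xi,\Lambda}[1] - 1 \times \mathcal{P}^{\mathrm{agree}}_{\Xi,\Lambda}[-1] \,\Big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) = \Lambda(T)\Big] \\
&\ge 1 \times \big(\mathcal{P}^{\mathrm{agree}}[1] - \varepsilon_2(\delta,\varepsilon)\big) - 1 \times \big(\mathcal{P}^{\mathrm{agree}}[-1] + \varepsilon_2(\delta,\varepsilon)\big) \\
&= -2\varepsilon_2(\delta,\varepsilon). \qquad (4.4)
\end{aligned}$$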
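Putting the pieces together, we find

$$\begin{aligned}
\mathbb{E}\big[\Delta \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big]
&= \mathbb{E}\big[\Delta \,\big|\, U > m(1-\varepsilon), F_{m,n,\varepsilon}\big] \times \mathbb{P}\big[U > m(1-\varepsilon) \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \\
&\quad + \mathbb{E}\big[\Delta \,\big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) \neq \Lambda(T)\big] \times \mathbb{P}\big[m(1-\varepsilon) \ge U > m\varepsilon, \Xi(T) \neq \Lambda(T) \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \\
&\quad + \mathbb{E}\big[\Delta \,\big|\, m(1-\varepsilon) \ge U > m\varepsilon, F_{m,n,\varepsilon}, \Xi(T) = \Lambda(T)\big] \times \mathbb{P}\big[m(1-\varepsilon) \ge U > m\varepsilon, \Xi(T) = \Lambda(T) \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \\
&\overset{(4.2),(4.3),(4.4)}{\ge} \Big[\tfrac12 - 3\max\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon, \varepsilon_3(\delta,\varepsilon)\big)\Big] \times \mathbb{P}\big[\Xi(T) \neq \Lambda(T) \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \\
&\quad + \min\Big[\tfrac12 - 3\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big), -2\varepsilon_2(\delta,\varepsilon)\Big] \times \mathbb{P}\big[\Xi(T) = \Lambda(T) \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \\
&\ge \Big[\tfrac12 - 3\max\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon, \varepsilon_3(\delta,\varepsilon)\big)\Big] \times \varepsilon + \min\Big[\tfrac12 - 3\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big), -2\varepsilon_2(\delta,\varepsilon)\Big] \times (1-\varepsilon),
\end{aligned}$$

where in the last step we used that $\mathbb{P}[\Xi(T) \neq \Lambda(T) \mid U > m\varepsilon, F_{m,n,\varepsilon}] = \mathbb{E}[U/m \mid U > m\varepsilon, F_{m,n,\varepsilon}] > \varepsilon$. □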



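Lemma 4.2. For any $\sigma(X,Y)$-measurable event $B$ and all $\delta < 5/6$, we have

$$\int_B \Delta \, d\mathbb{P} \ge -4\varepsilon_1(\delta) - 8e^{-nH(\delta)}.$$

Proof. Since $\Delta \ge -1$, Lemma 3.2 yields

$$\int_B \Delta \, d\mathbb{P} \ge \int_{B \cap E_{m,n}} \Delta \, d\mathbb{P} - \mathbb{P}[E_{m,n}^c] \ge \int_{B \cap E_{m,n}} \Delta \, d\mathbb{P} - 8e^{-nH(\delta)}. \qquad (4.5)$$

Clearly, for all $\omega \in \Omega$,

$$\Delta(\omega) \ge S(\tilde X(\omega), \tilde Y(\omega); \Xi_\omega) - S(X(\omega), Y(\omega); \Xi_\omega).$$

Therefore,

$$\begin{aligned}
\int_{B \cap E_{m,n}} \Delta \, d\mathbb{P}
&\ge \int_{B \cap E_{m,n}} \big(S(\tilde X, \tilde Y; \Xi) - S(X, Y; \Xi)\big) \, d\mathbb{P} \\
&= \int_{B \cap E_{m,n}} \big(s(\tilde X_T, Y_{\Xi(T)}) - s(X_T, Y_{\Xi(T)})\big) \, d\mathbb{P}[(X(\omega), Y(\omega), T(\omega))] \\
&= \int_{B \cap E_{m,n}} \mathbb{E}\big[s(\tilde X_T, Y_{\Xi(T)}) - s(X_T, Y_{\Xi(T)}) \,\big|\, (X,Y)\big] \, d\mathbb{P}[(X(\omega), Y(\omega))] \\
&\overset{(3.1)}{\ge} \int_{B \cap E_{m,n}} -1 \times 4\varepsilon_1(\delta) \, d\mathbb{P}[(X(\omega), Y(\omega))] \ge -4\varepsilon_1(\delta),
\end{aligned}$$

where the step marked (3.1) uses that the conditional expectation equals $\mathcal{P}_\Xi[(0,1)] + \mathcal{P}_\Xi[(1,0)] - \mathcal{P}_\Xi[(0,0)] - \mathcal{P}_\Xi[(1,1)]$, which is at least $-4\varepsilon_1(\delta)$ on $E_{m,n}$. Together with (4.5) this implies the claim. □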



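5. Proof of the Main Theorem. After introducing the tools of Sections 3 and 4, the stage is set for a proof of Theorem 1.1.

Proof. Write

$$D(\delta,\varepsilon) := \Big[\tfrac12 - 3\max\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon, \varepsilon_3(\delta,\varepsilon)\big)\Big] \times \varepsilon + \min\Big[\tfrac12 - 3\big(\varepsilon_4(\delta,\varepsilon) + 2\varepsilon\big), -2\varepsilon_2(\delta,\varepsilon)\Big] \times (1-\varepsilon)$$

for the lower bound of Lemma 4.1, and assume that $D(\delta,\varepsilon) > 0$, which, for fixed $\varepsilon < 1/12$, is the case for all sufficiently small $\delta$. Since $\{\omega : U \le m\varepsilon\} \cup F_{m,n,\varepsilon}^c$ is $\sigma(X,Y)$-measurable, (4.1), Lemma 4.1 and Lemma 4.2 give

$$\begin{aligned}
0 = \mathbb{E}[\Delta] &= \mathbb{E}\big[\Delta \,\big|\, U > m\varepsilon, F_{m,n,\varepsilon}\big] \times \mathbb{P}[U > m\varepsilon, F_{m,n,\varepsilon}] + \int_{\{U \le m\varepsilon\} \cup F_{m,n,\varepsilon}^c} \Delta \, d\mathbb{P} \\
&\ge D(\delta,\varepsilon) \times \big(\mathbb{P}[U > m\varepsilon] - \mathbb{P}[F_{m,n,\varepsilon}^c]\big) - \big(4\varepsilon_1(\delta) + 8e^{-nH(\delta)}\big).
\end{aligned}$$

Therefore, by Lemma 3.2,

$$\mathbb{P}[U > m\varepsilon] \le \frac{4\varepsilon_1(\delta) + 8e^{-nH(\delta)}}{D(\delta,\varepsilon)} + 10e^{-nH(\delta)} = \frac{O\big(\sqrt{H(\delta)}\big)}{\varepsilon/2 - O(\varepsilon^2) - O\big(\sqrt{H(\delta)/\varepsilon}\big)} + O\big(e^{-n\delta}\big).$$

□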
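The explicit bound at the end of the proof can be evaluated numerically. The Python sketch below is only an illustration. It assumes the parameter formulas for $\varepsilon_1, \dots, \varepsilon_4$ exactly as stated in (3.1) and Section 3, together with the coefficient $D(\delta,\varepsilon)$ of Lemma 4.1, and it returns infinity when $D(\delta,\varepsilon) \le 0$, in which case the estimate is vacuous. The helper names are ours.

```python
import math


def entropy(d):
    """H(d) = -(d ln d + (1 - d) ln(1 - d))."""
    return -(d * math.log(d) + (1 - d) * math.log(1 - d))


def eps_parameters(delta, eps):
    """epsilon_1(delta) and epsilon_2..4(delta, eps) as defined in Section 3."""
    h, q = entropy(delta), 1 - delta
    e1 = math.sqrt(9 * h / (4 * q))
    e2 = math.sqrt(3 * h / (2 * q * eps))
    e3 = math.sqrt(27 * h / (8 * q * eps))
    e4 = math.sqrt(9 * h / (2 * q * (1 - eps)))
    return e1, e2, e3, e4


def theorem_bound(delta, eps, n):
    """Right-hand side of the final estimate for P[U > m eps]."""
    e1, e2, e3, e4 = eps_parameters(delta, eps)
    coeff = ((0.5 - 3 * max(e4 + 2 * eps, e3)) * eps
             + min(0.5 - 3 * (e4 + 2 * eps), -2 * e2) * (1 - eps))
    if coeff <= 0:
        return math.inf  # Lemma 4.1 gives no useful information here
    tail = math.exp(-n * entropy(delta))
    return (4 * e1 + 8 * tail) / coeff + 10 * tail


for delta in (1e-3, 1e-8, 1e-12):
    print(delta, theorem_bound(delta, 0.05, 10**13))
```

Under these assumptions and with $\varepsilon = 0.05$, the coefficient $D$ is negative for $\delta = 10^{-3}$, and the bound is still roughly 0.5 for $\delta = 10^{-8}$. It drops well below $\varepsilon$ only for $\delta$ around $10^{-12}$. This reflects the $\sqrt{H(\delta)}$ dependence.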